Weighting in Information Retrieval Using Genetic Programming: A Three Stage Process
Abstract
This paper presents term-weighting schemes that have been evolved using genetic programming in an ad hoc Information Retrieval (IR) model. We create an entire term-weighting scheme by first assuming that term-weighting schemes contain a global part, a term-frequency influence part and a normalisation part. Most term-weighting schemes combine these three aspects to weight query terms and thus score a document in relation to a query. By separating the problem into three distinct phases we reduce the search space and ease the analysis of the schemes generated by the process. Evolutionary computation techniques are proving to be a viable alternative to standard analytical methods in many areas of IR. Genetic Programming (GP) [2] is an automated search algorithm inspired by biological evolution, and it has been shown to be an effective approach to learning term-weighting schemes in IR [5]. First, we evolve weighting schemes in a global domain, which promote the best terms to use in distinguishing documents. Then, using a suitable global scheme, we evolve term-frequency influence schemes, which use the within-document term frequency to correctly weight the term-frequency factor. Finally, we evolve normalisation schemes based on the best-performing combined global and term-frequency scheme. This framework is an extension of work carried out in [1].
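The three-part decomposition described in the abstract can be illustrated with a minimal Python sketch. The concrete formulas here (an idf-style global weight, a saturating term-frequency function, and a pivoted length normalisation with a hypothetical `slope` parameter) are textbook stand-ins chosen for illustration. They are assumptions, not the schemes evolved in the paper, and the way the three parts are combined is likewise only one plausible choice.

```python
import math
from collections import Counter


def collection_stats(docs):
    """Return (document frequencies, number of documents, average length)."""
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    avg_len = sum(len(doc) for doc in docs) / len(docs)
    return doc_freq, len(docs), avg_len


def global_weight(term, doc_freq, num_docs):
    """Global part (assumed idf-style form): rarer terms weigh more."""
    return math.log((num_docs + 1) / (doc_freq.get(term, 0) + 1))


def tf_influence(tf):
    """Term-frequency influence part (assumed saturating form)."""
    return tf / (tf + 1.0)


def normalisation(doc_len, avg_len, slope=0.25):
    """Normalisation part (assumed pivoted form); slope is hypothetical."""
    return (1.0 - slope) + slope * (doc_len / avg_len)


def score(query, doc, doc_freq, num_docs, avg_len):
    """Score a document for a query by combining the three parts."""
    counts = Counter(doc)
    norm = normalisation(len(doc), avg_len)
    total = 0.0
    for term in set(query):
        tf = counts[term]
        if tf > 0:
            total += global_weight(term, doc_freq, num_docs) * tf_influence(tf) / norm
    return total


if __name__ == "__main__":
    docs = [
        ["genetic", "programming", "retrieval"],
        ["retrieval", "retrieval", "model", "weighting", "terms"],
        ["cooking", "recipes"],
    ]
    doc_freq, num_docs, avg_len = collection_stats(docs)
    for i, doc in enumerate(docs):
        print(i, score(["genetic", "retrieval"], doc, doc_freq, num_docs, avg_len))
```

Keeping each part in its own function mirrors the staged search: one part can be replaced by an evolved expression while the other two are held fixed.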
Similar resources
Assessing the level of familiarity, use and also the effectiveness of mind maps in the information retrieval process
Background and Aim: A mind map is a full-color, illustrated form of note-taking in which the main idea or subject is placed at the center. Related ideas then branch out from the center and are linked to the central idea. This is a relatively new topic, and little research has been conducted worldwide to show its effectiveness. The aim is to examine the effectiveness of mind maps in the information retrieval process....
Evolutionary Learning of Boolean Queries by Genetic Programming
The performance of an information retrieval system is usually measured in terms of two different criteria, precision and recall. The optimization of any of its components is therefore a clear example of a multiobjective problem. However, although evolutionary programming has been widely applied in the information retrieval area, in all of these applications both criteria have been combined in ...
Effective Term Weighting for Sentence Retrieval
A well-known challenge of information retrieval is how to infer a user’s underlying information need when the input query consists of only a few keywords. Question Answering (QA) systems face an equally important but opposite challenge: given a verbose question, how can the system infer the relative importance of terms in order to differentiate the core information need from supporting context?...
A New Algorithm for Term Weighting in Text Summarization Process
The importance of a good weighting methodology in information retrieval methods – the method that affects the most useful features of a document or query representative – is examined. Good weighting methodologies are supposed to be more important than the feature selection process. Feature weighting is something that many information retrieval systems regard as being of minor importance as ...
Chaotic Genetic Algorithm based on Explicit Memory with a new Strategy for Updating and Retrieval of Memory in Dynamic Environments
Many of the problems considered in optimization and learning assume that solutions exist in a dynamic environment. Hence, algorithms are required that adapt dynamically to the problem's conditions and search under the new conditions. In most cases, using information from the past allows quick adaptation after a change. This is the idea underlying the use of memory in this field, which involves key design issue...